Learning to Extract Folktale Keywords
Authors
Abstract
Manually assigned keywords provide a valuable means of accessing large document collections. They can serve as a shallow document summary and enable more efficient retrieval and aggregation of information. In this paper, we investigate keywords in the context of the Dutch Folktale Database, a large collection of stories including fairy tales, jokes, and urban legends. We carry out a quantitative and qualitative analysis of the keywords in the collection. Up to 80% of the assigned keywords (or minor variations of them) appear in the text itself. Human annotators show moderate to substantial agreement in their judgments of keywords. Finally, we evaluate a learning-to-rank approach to extracting and ranking keyword candidates. We conclude that this is a promising approach to automating this time-intensive task.
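The abstract does not describe the features or the ranking model, so the following is only a minimal sketch of what a learning-to-rank keyword extractor can look like, under stated assumptions. Candidates are word n-grams (up to two words) that neither start nor end with a stopword; each gets three hypothetical features: log term frequency, relative position of first occurrence, and n-gram length. A pairwise perceptron, standing in for whatever learner the authors actually used, updates the weights whenever a gold keyword fails to outscore another candidate from the same story. The stopword list, the function names (`candidate_features`, `train_pairwise`, `rank_keywords`) and the toy English sentences in `DOCS` are illustrative assumptions, not material from the Dutch Folktale Database. Gold keywords are matched to candidates after lowercasing and tokenization only; the "minor variations" mentioned above would need extra normalization such as stemming.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list (assumption); a real system would need a Dutch one.
STOPWORDS = {"a", "an", "the", "in", "of", "and", "to", "is", "was"}

# Hypothetical toy training data: (story text, manually assigned keywords).
DOCS = [
    ("The wolf met a girl. The wolf ate the grandmother. A hunter saved the girl.",
     ["wolf"]),
    ("A witch lived in the forest. The witch caught two children. The children escaped.",
     ["witch"]),
]


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def candidate_features(text, max_n=2):
    """Map each candidate n-gram (not starting or ending with a stopword) to
    [log(1 + term frequency), relative first position, n-gram length]."""
    tokens = tokenize(text)
    length = max(len(tokens), 1)
    counts, first = Counter(), {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            key = " ".join(gram)
            counts[key] += 1
            first.setdefault(key, i)  # remember only the first occurrence
    return {c: [math.log1p(counts[c]), first[c] / length, float(len(c.split()))]
            for c in first}


def score(w, x):
    """Linear ranking score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(docs, epochs=50):
    """Pairwise perceptron: if a gold keyword does not outscore a non-keyword
    candidate from the same document, add their feature difference to w."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for text, keywords in docs:
            feats = candidate_features(text)
            gold = {" ".join(tokenize(k)) for k in keywords}
            positives = [x for c, x in feats.items() if c in gold]
            negatives = [x for c, x in feats.items() if c not in gold]
            for xp in positives:
                for xn in negatives:
                    if score(w, xp) <= score(w, xn):
                        w = [wi + a - b for wi, a, b in zip(w, xp, xn)]
                        mistakes += 1
        if mistakes == 0:  # every gold keyword already outranks all other candidates
            break
    return w


def rank_keywords(text, w, k=5):
    """Return the k highest-scoring candidates of text, best first."""
    feats = candidate_features(text)
    return sorted(feats, key=lambda c: score(w, feats[c]), reverse=True)[:k]


if __name__ == "__main__":
    weights = train_pairwise(DOCS)
    print(rank_keywords("The dragon guarded gold. A knight fought the dragon.", weights, k=1))
    # ['dragon']
```

A pairwise objective suits this task because only the relative order of candidates within one story matters, not the absolute score scale across stories. On data that is not separable the perceptron never reaches zero mistakes, which is why training is capped at `epochs` passes.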
Similar resources
Folktale Classification Using Learning to Rank
We present a learning to rank approach to classify folktales, such as fairy tales and urban legends, according to their story type, a concept that is widely used by folktale researchers to organize and classify folktales. A story type represents a collection of similar stories often with recurring plot and themes. Our work is guided by two frequently used story type classification schemes. Cont...
ProppLearner: Deeply annotating a corpus of Russian folktales to enable the machine learning of a Russian formalist theory
I describe the collection and deep annotation of the semantics of a corpus of Russian folktales. This corpus, which I call the ‘ProppLearner’ corpus, was assembled to provide data for an algorithm designed to learn Vladimir Propp’s morphology of Russian hero tales. The corpus is the most deeply annotated narrative corpus available at this time. The algorithm and learning results are described e...
Supporting the Exploration of Online Cultural Heritage Collections: The Case of the Dutch Folktale Database
This paper demonstrates the use of a user-centred design approach for the development of generous interfaces/rich prospect browsers for an online cultural heritage collection, determining its primary user groups and designing different browsing tools to cater to their specific needs. We set out to solve a set of problems faced by many online cultural heritage collections. These problems are lac...
Bibliometric Networks on Analyze Flipped Learning Research
Aim: The purpose is to provide a comprehensive overview of the current state of research in the field of flipped learning and classroom. It is a scientometric attempt to extract and analyze bibliographic networks based on the international scientific indexing (ISI). Methodology: A systematic search technique was applied: A set of scientific productions indexed in the field of flipped learning an...
Mark Alan Finlayson: Inferring Propp's Functions from Semantically Annotated Text
Vladimir Propp’s morphology of the Folktale is a seminal work in folkloristics and a compelling subject of computational study. I demonstrate a technique for learning Propp’s functions from semantically annotated text. Fifteen folktales from Propp’s corpus were annotated for semantic roles, co-reference, temporal structure, event sentiment, and dramatis personae. I derived a set of merge rules ...